The objective of this project is to analyze policing data taken from a Kaggle open dataset.
Source of the data: https://www.kaggle.com/center-for-policing-equity/data-science-for-good
This report is divided into several sections, each with its own numbered subsections, and the subsections build on one another. The report closes with conclusions and several recommendations for the stakeholder (in this case the Dallas Police Department) on improving its service to the community.
Before starting the analysis, we load the libraries that will be needed:
# load the required libraries
library(dplyr)
library(ggplot2)
library(plotly)
library(lubridate)
library(tidyverse)
library(sf)
library(mapview)
library(stringr)
library(reshape2)
Before starting the analysis we need to load the data, which can be done with the commands below
setwd('/Volumes/HP v212w/Kuliah/Data Visualization') # set the working directory
## load the data
data <- read.csv('Data.csv',header = TRUE)
## drop the first row, because it contains column names rather than data
data_fix <- data[2:2384,]
## inspect the structure of the data
str(data_fix)
## 'data.frame': 2383 obs. of 47 variables:
## $ INCIDENT_DATE : chr "9/3/16" "3/22/16" "5/22/16" "1/10/16" ...
## $ INCIDENT_TIME : chr "4:14:00 AM" "11:00:00 PM" "1:29:00 PM" "8:55:00 PM" ...
## $ UOF_NUMBER : chr "37702" "33413" "34567" "31460" ...
## $ OFFICER_ID : chr "10810" "7706" "11014" "6692" ...
## $ OFFICER_GENDER : chr "Male" "Male" "Male" "Male" ...
## $ OFFICER_RACE : chr "Black" "White" "Black" "Black" ...
## $ OFFICER_HIRE_DATE : chr "5/7/14" "1/8/99" "5/20/15" "7/29/91" ...
## $ OFFICER_YEARS_ON_FORCE : chr "2" "17" "1" "24" ...
## $ OFFICER_INJURY : chr "No" "Yes" "No" "No" ...
## $ OFFICER_INJURY_TYPE : chr "No injuries noted or visible" "Sprain/Strain" "No injuries noted or visible" "No injuries noted or visible" ...
## $ OFFICER_HOSPITALIZATION : chr "No" "Yes" "No" "No" ...
## $ SUBJECT_ID : chr "46424" "44324" "45126" "43150" ...
## $ SUBJECT_RACE : chr "Black" "Hispanic" "Hispanic" "Hispanic" ...
## $ SUBJECT_GENDER : chr "Female" "Male" "Male" "Male" ...
## $ SUBJECT_INJURY : chr "Yes" "No" "No" "Yes" ...
## $ SUBJECT_INJURY_TYPE : chr "Non-Visible Injury/Pain" "No injuries noted or visible" "No injuries noted or visible" "Laceration/Cut" ...
## $ SUBJECT_WAS_ARRESTED : chr "Yes" "Yes" "Yes" "Yes" ...
## $ SUBJECT_DESCRIPTION : chr "Mentally unstable" "Mentally unstable" "Unknown" "FD-Unknown if Armed" ...
## $ SUBJECT_OFFENSE : chr "APOWW" "APOWW" "APOWW" "Evading Arrest" ...
## $ REPORTING_AREA : chr "2062" "1197" "4153" "4523" ...
## $ BEAT : chr "134" "237" "432" "641" ...
## $ SECTOR : chr "130" "230" "430" "640" ...
## $ DIVISION : chr "CENTRAL" "NORTHEAST" "SOUTHWEST" "NORTH CENTRAL" ...
## $ LOCATION_DISTRICT : chr "D14" "D9" "D6" "D11" ...
## $ STREET_NUMBER : chr "211" "7647" "716" "5600" ...
## $ STREET_NAME : chr "Ervay" "Ferguson" "bimebella dr" "LBJ" ...
## $ STREET_DIRECTION : chr "N" "NULL" "NULL" "NULL" ...
## $ STREET_TYPE : chr "St." "Rd." "Ln." "Frwy." ...
## $ LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION: chr "211 N ERVAY ST" "7647 FERGUSON RD" "716 BIMEBELLA LN" "5600 L B J FWY" ...
## $ LOCATION_CITY : chr "Dallas" "Dallas" "Dallas" "Dallas" ...
## $ LOCATION_STATE : chr "TX" "TX" "TX" "TX" ...
## $ LOCATION_LATITUDE : chr "32.782205" "32.798978" "32.73971" "" ...
## $ LOCATION_LONGITUDE : chr "-96.797461" "-96.717493" "-96.92519" "" ...
## $ INCIDENT_REASON : chr "Arrest" "Arrest" "Arrest" "Arrest" ...
## $ REASON_FOR_FORCE : chr "Arrest" "Arrest" "Arrest" "Arrest" ...
## $ TYPE_OF_FORCE_USED1 : chr "Hand/Arm/Elbow Strike" "Joint Locks" "Take Down - Group" "K-9 Deployment" ...
## $ TYPE_OF_FORCE_USED2 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED3 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED4 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED5 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED6 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED7 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED8 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED9 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED10 : chr "" "" "" "" ...
## $ NUMBER_EC_CYCLES : chr "NULL" "NULL" "NULL" "NULL" ...
## $ FORCE_EFFECTIVE : chr " Yes" " Yes" " Yes" " Yes" ...
## next, we list the column names
names(data_fix)
## [1] "INCIDENT_DATE"
## [2] "INCIDENT_TIME"
## [3] "UOF_NUMBER"
## [4] "OFFICER_ID"
## [5] "OFFICER_GENDER"
## [6] "OFFICER_RACE"
## [7] "OFFICER_HIRE_DATE"
## [8] "OFFICER_YEARS_ON_FORCE"
## [9] "OFFICER_INJURY"
## [10] "OFFICER_INJURY_TYPE"
## [11] "OFFICER_HOSPITALIZATION"
## [12] "SUBJECT_ID"
## [13] "SUBJECT_RACE"
## [14] "SUBJECT_GENDER"
## [15] "SUBJECT_INJURY"
## [16] "SUBJECT_INJURY_TYPE"
## [17] "SUBJECT_WAS_ARRESTED"
## [18] "SUBJECT_DESCRIPTION"
## [19] "SUBJECT_OFFENSE"
## [20] "REPORTING_AREA"
## [21] "BEAT"
## [22] "SECTOR"
## [23] "DIVISION"
## [24] "LOCATION_DISTRICT"
## [25] "STREET_NUMBER"
## [26] "STREET_NAME"
## [27] "STREET_DIRECTION"
## [28] "STREET_TYPE"
## [29] "LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION"
## [30] "LOCATION_CITY"
## [31] "LOCATION_STATE"
## [32] "LOCATION_LATITUDE"
## [33] "LOCATION_LONGITUDE"
## [34] "INCIDENT_REASON"
## [35] "REASON_FOR_FORCE"
## [36] "TYPE_OF_FORCE_USED1"
## [37] "TYPE_OF_FORCE_USED2"
## [38] "TYPE_OF_FORCE_USED3"
## [39] "TYPE_OF_FORCE_USED4"
## [40] "TYPE_OF_FORCE_USED5"
## [41] "TYPE_OF_FORCE_USED6"
## [42] "TYPE_OF_FORCE_USED7"
## [43] "TYPE_OF_FORCE_USED8"
## [44] "TYPE_OF_FORCE_USED9"
## [45] "TYPE_OF_FORCE_USED10"
## [46] "NUMBER_EC_CYCLES"
## [47] "FORCE_EFFECTIVE"
Finding: all columns are stored as character.
Columns that hold numbers need to be converted to numeric. Here we convert the officer's years on the force with the command below
## convert OFFICER_YEARS_ON_FORCE from character to numeric
data_fix$OFFICER_YEARS_ON_FORCE <- as.numeric(data_fix$OFFICER_YEARS_ON_FORCE)
## recheck the structure to confirm the conversion
str(data_fix)
## 'data.frame': 2383 obs. of 47 variables:
## $ INCIDENT_DATE : chr "9/3/16" "3/22/16" "5/22/16" "1/10/16" ...
## $ INCIDENT_TIME : chr "4:14:00 AM" "11:00:00 PM" "1:29:00 PM" "8:55:00 PM" ...
## $ UOF_NUMBER : chr "37702" "33413" "34567" "31460" ...
## $ OFFICER_ID : chr "10810" "7706" "11014" "6692" ...
## $ OFFICER_GENDER : chr "Male" "Male" "Male" "Male" ...
## $ OFFICER_RACE : chr "Black" "White" "Black" "Black" ...
## $ OFFICER_HIRE_DATE : chr "5/7/14" "1/8/99" "5/20/15" "7/29/91" ...
## $ OFFICER_YEARS_ON_FORCE : num 2 17 1 24 7 7 7 9 4 8 ...
## $ OFFICER_INJURY : chr "No" "Yes" "No" "No" ...
## $ OFFICER_INJURY_TYPE : chr "No injuries noted or visible" "Sprain/Strain" "No injuries noted or visible" "No injuries noted or visible" ...
## $ OFFICER_HOSPITALIZATION : chr "No" "Yes" "No" "No" ...
## $ SUBJECT_ID : chr "46424" "44324" "45126" "43150" ...
## $ SUBJECT_RACE : chr "Black" "Hispanic" "Hispanic" "Hispanic" ...
## $ SUBJECT_GENDER : chr "Female" "Male" "Male" "Male" ...
## $ SUBJECT_INJURY : chr "Yes" "No" "No" "Yes" ...
## $ SUBJECT_INJURY_TYPE : chr "Non-Visible Injury/Pain" "No injuries noted or visible" "No injuries noted or visible" "Laceration/Cut" ...
## $ SUBJECT_WAS_ARRESTED : chr "Yes" "Yes" "Yes" "Yes" ...
## $ SUBJECT_DESCRIPTION : chr "Mentally unstable" "Mentally unstable" "Unknown" "FD-Unknown if Armed" ...
## $ SUBJECT_OFFENSE : chr "APOWW" "APOWW" "APOWW" "Evading Arrest" ...
## $ REPORTING_AREA : chr "2062" "1197" "4153" "4523" ...
## $ BEAT : chr "134" "237" "432" "641" ...
## $ SECTOR : chr "130" "230" "430" "640" ...
## $ DIVISION : chr "CENTRAL" "NORTHEAST" "SOUTHWEST" "NORTH CENTRAL" ...
## $ LOCATION_DISTRICT : chr "D14" "D9" "D6" "D11" ...
## $ STREET_NUMBER : chr "211" "7647" "716" "5600" ...
## $ STREET_NAME : chr "Ervay" "Ferguson" "bimebella dr" "LBJ" ...
## $ STREET_DIRECTION : chr "N" "NULL" "NULL" "NULL" ...
## $ STREET_TYPE : chr "St." "Rd." "Ln." "Frwy." ...
## $ LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION: chr "211 N ERVAY ST" "7647 FERGUSON RD" "716 BIMEBELLA LN" "5600 L B J FWY" ...
## $ LOCATION_CITY : chr "Dallas" "Dallas" "Dallas" "Dallas" ...
## $ LOCATION_STATE : chr "TX" "TX" "TX" "TX" ...
## $ LOCATION_LATITUDE : chr "32.782205" "32.798978" "32.73971" "" ...
## $ LOCATION_LONGITUDE : chr "-96.797461" "-96.717493" "-96.92519" "" ...
## $ INCIDENT_REASON : chr "Arrest" "Arrest" "Arrest" "Arrest" ...
## $ REASON_FOR_FORCE : chr "Arrest" "Arrest" "Arrest" "Arrest" ...
## $ TYPE_OF_FORCE_USED1 : chr "Hand/Arm/Elbow Strike" "Joint Locks" "Take Down - Group" "K-9 Deployment" ...
## $ TYPE_OF_FORCE_USED2 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED3 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED4 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED5 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED6 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED7 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED8 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED9 : chr "" "" "" "" ...
## $ TYPE_OF_FORCE_USED10 : chr "" "" "" "" ...
## $ NUMBER_EC_CYCLES : chr "NULL" "NULL" "NULL" "NULL" ...
## $ FORCE_EFFECTIVE : chr " Yes" " Yes" " Yes" " Yes" ...
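The structure output also shows that missing values are stored as the literal string "NULL" or as empty strings (see STREET_DIRECTION, NUMBER_EC_CYCLES and LOCATION_LATITUDE). The following is a minimal sketch of one way to turn these into proper NA values before converting types. It uses a small made-up data frame, and the names `toy`, `clean_missing` and `toy_clean` are hypothetical, not part of the original analysis.

```r
# Toy data mimicking three columns of the dataset (values are illustrative)
toy <- data.frame(
  OFFICER_YEARS_ON_FORCE = c("2", "17", "NULL"),
  LOCATION_LATITUDE      = c("32.782205", "", "32.73971"),
  STREET_DIRECTION       = c("N", "NULL", "NULL"),
  stringsAsFactors = FALSE
)

# Replace "NULL" and empty strings with NA in every character column
clean_missing <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.character(col)) col[col %in% c("NULL", "")] <- NA
    col
  })
  df
}

toy_clean <- clean_missing(toy)

# Convert the numeric-looking columns; NA stays NA
num_cols <- c("OFFICER_YEARS_ON_FORCE", "LOCATION_LATITUDE")
toy_clean[num_cols] <- lapply(toy_clean[num_cols], as.numeric)

str(toy_clean)
```

Cleaning the placeholders first means later filters can test with is.na() instead of comparing against the string "NULL" column by column.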
First, we would like to see the overall trend of incidents by month, using the commands below
# parse the incident date (two-digit year, hence %y) and keep only the month
time_data_date <- as.Date(data_fix$INCIDENT_DATE, format = "%m/%d/%y")
data_month <- month(time_data_date)
monthly <- data.frame(table(data_month))
# visualize the monthly counts
ggplot(monthly,aes(x=data_month,y=Freq,group=1)) + geom_line() + geom_point() +
xlab("Month") + ylab("Number of incidents") + ggtitle("Incident trend throughout 2016") +
theme(plot.title = element_text(hjust = 0.5)) + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
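The dates in this dataset use a two-digit year, which matters when choosing the format string. As a small base R sketch, using four sample dates copied from the structure output (the variable names `dates_chr`, `months_num` and `monthly_counts` are hypothetical):

```r
# Sample dates copied from the str() output of the dataset
dates_chr <- c("9/3/16", "3/22/16", "5/22/16", "1/10/16")

# %y reads a two-digit year ("16" becomes 2016);
# %Y expects a full year and would treat 16 as the year 16
dates <- as.Date(dates_chr, format = "%m/%d/%y")

# Extract the month number and count per month,
# fixing the levels so that months without incidents show up as zero
months_num <- as.integer(format(dates, "%m"))
monthly_counts <- table(factor(months_num, levels = 1:12))

print(dates)
print(monthly_counts)
```

Fixing the factor levels to 1 through 12 keeps the x-axis complete even if a month has no records.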
After seeing the trend, we would like to perform a deeper analysis of the officers involved in the incidents. We first look at the distribution of their experience using the boxplot below
# visualize the boxplot
boxplot(data_fix$OFFICER_YEARS_ON_FORCE,ylab='Officer Experience (Years)')
To get a clearer picture, we also look at the summary statistics and a histogram
summary(data_fix$OFFICER_YEARS_ON_FORCE)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 3.000 6.000 8.049 10.000 36.000
### draw the histogram of officer experience
hist(data_fix$OFFICER_YEARS_ON_FORCE,xlim=range(0,39),breaks =25
,col='red',xlab='Officer Experience (Years)',ylab='Number of officers',
main="Distribution of Incidents by Officer's Experience")
Is there any relationship between experience and injury? Are more experienced officers less likely to be injured than less experienced officers? To check whether officer experience is correlated with injury and hospitalization, we perform the correlation analysis below
# encode the Yes/No categorical columns as 1/0
data_fix$Nilai_Hospital<-ifelse(data_fix$OFFICER_HOSPITALIZATION=="Yes",1,0)
data_fix$Nilai_Injury<-ifelse(data_fix$OFFICER_INJURY=="Yes",1,0)
data_fix$Nilai_Arrest_Suspect<-ifelse(data_fix$SUBJECT_WAS_ARRESTED=="Yes",1,0)
data_fix$Nilai_Injury_Suspect<-ifelse(data_fix$SUBJECT_INJURY=="Yes",1,0)
data_korelasi <- round(cor(data_fix[, unlist(lapply(data_fix, is.numeric))]),digits = 2)
#melt the data frame
melted_korelasi <- melt(data_korelasi)
#create correlation heatmap
ggplot(data = melted_korelasi, aes(x=Var1, y=Var2, fill=value)) +
geom_tile() +
geom_text(aes(label = value), size = 5) +
scale_fill_gradient2(low = "blue", high = "red",
limit = c(-1,1), name="Correlation") +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.background = element_blank())
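Because the injury and hospitalization flags are 0/1 values, a single correlation coefficient against years of experience can be hard to interpret. One complementary view is to compare the share of injured officers across experience bands. The sketch below uses made-up numbers only, so the resulting rates say nothing about the real data; `toy`, `band` and `injury_rate` are hypothetical names and the band boundaries are an assumption.

```r
# Made-up example: years of experience and a 0/1 injury flag
toy <- data.frame(
  years   = c(1, 2, 3, 4, 6, 7, 9, 12, 18, 24),
  injured = c(1, 0, 1, 0, 0, 1, 0, 0, 0, 0)
)

# Group experience into bands: [0,5), [5,10), [10,Inf)
toy$band <- cut(toy$years, breaks = c(0, 5, 10, Inf), right = FALSE,
                labels = c("0-4", "5-9", "10+"))

# The mean of a 0/1 flag is the share of injured officers in each band
injury_rate <- tapply(toy$injured, toy$band, mean)
print(round(injury_rate, 2))

# Pearson correlation between years and the 0/1 flag, for comparison
print(cor(toy$years, toy$injured))
```

The banded rates can be read directly as proportions, which is often easier to communicate to stakeholders than a correlation value.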
After looking at the officers involved, we would like to see at what time of day the incidents happened and whether some hours have more incidents than others
### parse the incident time column
time_data_hour <- parse_date_time(data_fix$INCIDENT_TIME, "%H:%M:%S %p")
## Warning: 10 failed to parse.
### 10 values failed to parse; this is a small share of the 2383 records, so we drop them
### extract only the hour from the parsed time
data_hour <- na.omit(hour(time_data_hour))
### plot the hours as a histogram
fig <- plot_ly(x = data_hour, type = "histogram",nbinsx=70)%>%
layout(autosize=T,title='Distribution of Incidents by Hour',
xaxis=list(title='Hour'),yaxis=list(title='Number of incidents'))
fig
table(data_hour) ##### the same counts as a table
## data_hour
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
## 142 115 152 101 59 52 21 20 36 49 70 51 76 97 54 70 118 179 165 147
## 20 21 22 23
## 181 159 129 130
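The hourly table can be condensed into broader periods of the day to make the pattern easier to state. The sketch below copies the 24 counts from the table above and sums them into four six-hour blocks. The block boundaries and the names `hourly`, `period`, `by_period` and `share` are assumptions made for illustration.

```r
# Hourly incident counts copied from the table above (hours 0 to 23)
hourly <- c(142, 115, 152, 101, 59, 52, 21, 20, 36, 49, 70, 51,
            76, 97, 54, 70, 118, 179, 165, 147, 181, 159, 129, 130)
names(hourly) <- 0:23

# Assign each hour to a six-hour period of the day (boundaries are an assumption)
period <- cut(0:23, breaks = c(-1, 5, 11, 17, 23),
              labels = c("night (0-5)", "morning (6-11)",
                         "afternoon (12-17)", "evening (18-23)"))

# Sum the counts per period and express them as a percentage of all incidents
by_period <- tapply(hourly, period, sum)
share <- round(100 * by_period / sum(hourly), 1)

print(by_period)
print(share)
```

On these counts the evening block is the largest and the morning block the smallest.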
To support the recommendations to the stakeholders, we look at the gender distribution of the officers involved. Because there is a large gap between the two groups, we use a log scale on the y-axis
pd <-table(data_fix$OFFICER_GENDER)
# visualize the categorical data with a bar plot
ggplot(data=data_fix,aes(x=OFFICER_GENDER)) + geom_bar(fill='blue')+
xlab('Officer Gender') + ylab('Number of officers') +
ggtitle("Number of incidents by gender") +
theme(plot.title = element_text(hjust = 0.5)) + scale_y_continuous(trans='log2')
After the insight from gender, we narrow down to the race of the officer: is any particular race involved in the incidents more often?
# count officers by gender and race, excluding race 'Other'
data_race <- data.frame(data_fix[data_fix$OFFICER_RACE!='Other',] %>% group_by(OFFICER_GENDER) %>% count(OFFICER_RACE))
data_sort <- data_race[order(data_race$OFFICER_GENDER,data_race$n),]
# visualize the sorted data with an interactive plotly bar chart
plot_ly(data = data_sort,x = ~OFFICER_GENDER,y = ~n,
color = ~OFFICER_RACE,
type = "bar"
) %>%
layout(barmode = "stack",xaxis=list(title='Gender'),
title='Officer Gender and Race',
yaxis = list(title='Number of officers',type = "log"))
# count incidents by officer race and show the composition as a pie chart
pd <- data.frame(count(data_fix,data_fix$OFFICER_RACE))
ki <- pd %>% rename(race = data_fix.OFFICER_RACE)
plot_ly(data = ki,labels=~race,values =~n,type ="pie") %>%
layout(title="Composition of incidents by officer race",xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = TRUE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = TRUE))
We also want to see how many officers were injured and how many needed hospitalization
# bar plot of officer injuries
ggplot(data=data_fix,aes(x=OFFICER_INJURY)) + geom_bar(fill='blue')+
xlab('Officer Condition') + ylab('Number of officers') +
ggtitle("Number of injured officers")+
theme(plot.title = element_text(hjust = 0.5)) + scale_y_continuous(trans='log2')
# bar plot of officer hospitalizations
ggplot(data=data_fix,aes(x=OFFICER_HOSPITALIZATION)) + geom_bar(fill='blue')+
xlab('Officer Condition') + ylab('Number of officers') +
ggtitle("Number of hospitalized officers")+
theme(plot.title = element_text(hjust = 0.5)) + scale_y_continuous(trans='log2')
After looking at the officers' side, we turn to the subjects' gender and race
# create the subset with dplyr, dropping unknown and missing values
data_race_subject <- data.frame(filter(data_fix,SUBJECT_GENDER!="Unknown" &
SUBJECT_GENDER != "NULL" &
SUBJECT_RACE!="NULL" & SUBJECT_RACE != "Other") %>%
group_by(SUBJECT_GENDER) %>% count(SUBJECT_RACE))
data_sort_subject <- data_race_subject[order(data_race_subject$SUBJECT_GENDER,data_race_subject$n,decreasing=FALSE),]
row.names(data_sort_subject) <- NULL
plot_ly(data = data_sort_subject,x = ~SUBJECT_GENDER,y = ~n,
color = ~SUBJECT_RACE,
type = "bar"
) %>%
layout(barmode = "stack",xaxis=list(title='Gender'),
title='Subject Gender and Race',
yaxis = list(title='Number of subjects'))
We would also like to see whether the subjects were injured or not
# count subjects by injury status, dropping unknown and missing genders
data_subject_injured <- data.frame(data_fix[data_fix$SUBJECT_GENDER!='NULL' & data_fix$SUBJECT_GENDER!='Unknown', ] %>%
count(SUBJECT_INJURY))
data_sub_injured_sort <- data_subject_injured[order(data_subject_injured$SUBJECT_INJURY,data_subject_injured$n),]
# visualize the data using plotly
plot_ly(data = data_subject_injured,labels=~SUBJECT_INJURY,values =~n,type ="pie") %>%
layout(title="Composition of incidents by subject injury",xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = TRUE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = TRUE))
Next, we look at the locations where the incidents happened
# count incidents per division
data_location <- data.frame(data_fix
%>% group_by(data_fix$DIVISION) %>%
count(data_fix$DIVISION))
names(data_location)
## [1] "data_fix.DIVISION" "n"
data_sort<- data_location[order(data_location$n),]
#visualize the data
plot_ly(data = data_sort,x = data_sort$data_fix.DIVISION,y = data_sort$n,
type = "bar") %>%
layout(title ='Number of Incidents by Division',
xaxis = list(categoryorder = "total descending",title='Division'),yaxis=list(title='Number of Incidents'))
data_district <- data.frame(data_fix
%>% group_by(data_fix$LOCATION_DISTRICT) %>%
count(data_fix$LOCATION_DISTRICT))
plot_ly(data = data_district,x = data_district$data_fix.LOCATION_DISTRICT,y = data_district$n,
type = "bar") %>%
layout(title ='Number of Incidents by District',
xaxis = list(categoryorder = "total descending",title='District'),yaxis=list(title='Number of Incidents'))
We would also like to map the incidents using the latitude and longitude data
#create subset data for mapping
data_lat <- as.double(data_fix$LOCATION_LATITUDE)
data_long <- as.double(data_fix$LOCATION_LONGITUDE)
data_lat_long <- data.frame(data_lat,data_long,data_fix$DIVISION,data_fix$OFFICER_ID ,data_fix$OFFICER_GENDER,data_fix$OFFICER_INJURY,data_fix$OFFICER_HOSPITALIZATION, data_fix$SUBJECT_WAS_ARRESTED,data_fix$SUBJECT_GENDER)
data_filter_lat_long <- na.omit(data_lat_long)
names(data_filter_lat_long)
## [1] "data_lat" "data_long"
## [3] "data_fix.DIVISION" "data_fix.OFFICER_ID"
## [5] "data_fix.OFFICER_GENDER" "data_fix.OFFICER_INJURY"
## [7] "data_fix.OFFICER_HOSPITALIZATION" "data_fix.SUBJECT_WAS_ARRESTED"
## [9] "data_fix.SUBJECT_GENDER"
#create mapping data
mapview(data_filter_lat_long, xcol = "data_long", ycol = "data_lat",zcol="data_fix.DIVISION",crs = 4269, grid = FALSE,map.types = "Stamen.Toner")
summary(data_filter_lat_long)
## data_lat data_long data_fix.DIVISION data_fix.OFFICER_ID
## Min. :32.63 Min. :-96.96 Length:2328 Length:2328
## 1st Qu.:32.74 1st Qu.:-96.82 Class :character Class :character
## Median :32.78 Median :-96.79 Mode :character Mode :character
## Mean :32.80 Mean :-96.78
## 3rd Qu.:32.86 3rd Qu.:-96.75
## Max. :33.02 Max. :-96.57
## data_fix.OFFICER_GENDER data_fix.OFFICER_INJURY
## Length:2328 Length:2328
## Class :character Class :character
## Mode :character Mode :character
##
##
##
## data_fix.OFFICER_HOSPITALIZATION data_fix.SUBJECT_WAS_ARRESTED
## Length:2328 Length:2328
## Class :character Class :character
## Mode :character Mode :character
##
##
##
## data_fix.SUBJECT_GENDER
## Length:2328
## Class :character
## Mode :character
##
##
##
hist(data_lat,breaks=100,xlab ="Latitude",main ="Distribution from latitude")
abline(v=32.78,col='red')
hist(data_long,breaks=100,xlab='Longitude',main ="Distribution from longitude")
abline(v=-96.79,col='red')
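The red reference lines above are placed at the medians reported in the summary (32.78 and -96.79). As a hedged sketch, the median can instead be computed from the data so that the line stays in sync if the data change. The first four values below are copied from the structure output (including one empty string), the last one is made up, and `lat_chr`, `lat` and `lat_median` are hypothetical names.

```r
# Latitude strings: four copied from the str() output, one illustrative value
lat_chr <- c("32.782205", "32.798978", "32.73971", "", "32.86")

# Empty strings become NA when converted to numeric
lat <- suppressWarnings(as.double(lat_chr))

# Compute the median while ignoring missing coordinates
lat_median <- median(lat, na.rm = TRUE)

# Draw the histogram and mark the computed median
hist(lat, breaks = 10, xlab = "Latitude", main = "Distribution of latitude")
abline(v = lat_median, col = "red")
```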
We would like to know whether one reason accounts for the majority of the incidents
unique(data_fix$INCIDENT_REASON)
## [1] "Arrest" "Service Call" "Suspicious Activity"
## [4] "Traffic Stop" "Other ( In Narrative)" "Crime in Progress"
## [7] "Warrant Execution" "Call for Cover" "Off-Duty Employment"
## [10] "Pedestrian Stop" "NULL" "Off-Duty Incident"
## [13] "Crowd Control" "Accidental Discharge"
data_reason <- data.frame(data_fix %>% count(INCIDENT_REASON))
### remove the NULL values, which can be done with this command
data_reason <- data_reason[data_reason$INCIDENT_REASON != "NULL",]
names(data_reason)
## [1] "INCIDENT_REASON" "n"
plot_ly(data = data_reason,labels=~INCIDENT_REASON,values =~n,type ="pie") %>%
layout(title="Composition of incidents by reason",xaxis = list(showgrid = FALSE, zeroline = FALSE,
showticklabels = TRUE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = TRUE))
The data cover 2016 incidents that occurred during policing duties in Dallas. Several conclusions can be drawn from the data. First, the department succeeded in reducing the number of incidents over the course of 2016. Second, there are several areas that need attention to prevent further incidents. Third, there are several steps the department can take to prevent further incidents, which are covered in the recommendations.
Based on the data, the following recommendations can be given to the stakeholders: